Social Media Timeline Design - HLD Architecture
Core Problem Statement
Challenge: Generate personalized timelines for billions of users in real-time while handling celebrity posts that can reach millions of followers instantly.
Scale Requirements:
- Reads: 100:1 read-to-write ratio (users scroll more than they post)
- Latency: <200ms timeline load time
- Throughput: Millions of timeline requests per second
- Storage: Billions of posts with complex relationships
1. Timeline Generation Strategies
Three Core Approaches
1.1 Pull Model (Fan-out on Read)
Timeline Request Flow:
User Request → Timeline Service → Query Following List → Fetch Recent Posts → Rank & Merge → Return Timeline
Characteristics:
├── Real-time content (always fresh)
├── Low storage overhead
├── High compute cost per read
└── Slow response times (complex queries)
Best For:
├── Celebrity accounts (millions of followers)
├── Less active users
├── Systems prioritizing storage efficiency
└── Users following many inactive accounts
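The pull flow above can be sketched as a k-way merge over each followed author's recent posts. This is a minimal illustration, not a production design: `POSTS_BY_AUTHOR`, `FOLLOWING`, and `pull_timeline` are hypothetical names, the stores are plain dictionaries, and ranking is reduced to recency.

```python
import heapq
from itertools import islice

# Hypothetical in-memory stand-ins for the post store and the follow graph.
# Each author's posts are kept newest-first as (timestamp, post_id) tuples.
POSTS_BY_AUTHOR = {
    "alice": [(105, "a2"), (100, "a1")],
    "bob": [(103, "b1")],
    "carol": [(110, "c2"), (90, "c1")],
}
FOLLOWING = {"dave": ["alice", "bob", "carol"]}


def pull_timeline(user_id, limit=10):
    """Build a timeline at read time by merging each followee's recent posts."""
    feeds = [POSTS_BY_AUTHOR.get(author, []) for author in FOLLOWING.get(user_id, [])]
    # Every feed is already sorted newest-first, so a k-way merge avoids a full sort.
    merged = heapq.merge(*feeds, key=lambda post: post[0], reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]


timeline = pull_timeline("dave", limit=4)
```

The read cost grows with the number of followed accounts, which is the "high compute cost per read" noted above.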
1.2 Push Model (Fan-out on Write)
Post Creation Flow:
New Post → Get Followers List → Push to All Follower Timelines → Store in Timeline Cache
Timeline Request Flow:
User Request → Timeline Service → Fetch Pre-computed Timeline → Return Results
Characteristics:
├── Ultra-fast reads (pre-computed)
├── High storage cost (duplicate data)
├── Expensive writes for popular users
└── Potential stale data during high activity
Best For:
├── Regular users (<10K followers)
├── Active users who read frequently
├── Real-time systems prioritizing read speed
└── Mobile apps needing fast loading
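As a rough sketch of the push flow, the snippet below copies each new post id into every follower's cached timeline at write time, so a read becomes a simple lookup. The names `FOLLOWERS`, `timeline_cache`, `publish`, and `read_timeline` and the cap of 100 entries are assumptions for illustration; a real system would use an external cache rather than a Python dictionary.

```python
from collections import defaultdict, deque

TIMELINE_LIMIT = 100  # assumed cap on cached entries per user

# Hypothetical follow graph: author -> followers.
FOLLOWERS = {"alice": ["bob", "carol"], "bob": ["carol"]}

# Stand-in for the timeline cache: user -> newest-first post ids.
timeline_cache = defaultdict(lambda: deque(maxlen=TIMELINE_LIMIT))


def publish(author_id, post_id):
    """Fan out on write: copy the new post id into every follower's timeline."""
    followers = FOLLOWERS.get(author_id, [])
    for follower_id in followers:
        timeline_cache[follower_id].appendleft(post_id)
    return len(followers)  # number of timeline writes this post cost


def read_timeline(user_id, limit=20):
    """Reads are a cache lookup; no merging or ranking happens here."""
    return list(timeline_cache.get(user_id, ()))[:limit]


publish("alice", "a1")
publish("bob", "b1")
```

The return value of `publish` makes the write amplification visible: one post costs one cache write per follower.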
1.3 Hybrid Model (Mixed Strategy)
Decision Logic (applied to the author of each post):
if (author.followers_count > CELEBRITY_THRESHOLD) {
    use PULL_MODEL    // skip fan-out; followers fetch these posts at read time
} else {
    use PUSH_MODEL    // fan out to follower timelines at write time
}
Timeline Assembly:
├── Pull celebrity posts on-demand
├── Merge with pre-computed timeline cache
├── Apply personalization ranking
└── Return unified timeline
Characteristics:
├── Balanced performance and cost
├── Complex implementation
├── Optimal for mixed user bases
└── Scalable across user segments
2. Celebrity Problem Deep Dive
The Celebrity Challenge
Problem: When a celebrity with 50M followers posts, the fan-out-on-write approach would:
- Create 50M timeline entries instantly
- Overwhelm the write system
- Cause massive storage overhead
- Delay post visibility due to processing time
Celebrity Handling Strategies
2.1 Celebrity Detection
Celebrity Classification:
├── Static Thresholds
│   ├── >1M followers = Celebrity
│   ├── 100K-1M followers = Influencer
│   ├── <100K followers = Regular user
│   └── Verified accounts = Celebrity
├── Dynamic Classification
│   ├── Follower growth rate
│   ├── Engagement velocity
│   ├── Post reach metrics
│   └── Media mentions
└── Manual Override
    ├── News organizations
    ├── Government accounts
    ├── Brand accounts
    └── Emergency services
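The static thresholds and manual override above could be expressed roughly as follows. This is a sketch under stated assumptions: `Account`, `classify`, and `MANUAL_CELEBRITIES` are hypothetical names, and the dynamic signals (growth rate, engagement velocity) are left out entirely.

```python
from dataclasses import dataclass

CELEBRITY_MIN = 1_000_000   # thresholds taken from the static tiers above
INFLUENCER_MIN = 100_000

# Hypothetical manual override list (e.g. news or emergency accounts).
MANUAL_CELEBRITIES = {"city_emergency_alerts"}


@dataclass
class Account:
    handle: str
    followers: int
    verified: bool = False


def classify(account):
    """Return 'celebrity', 'influencer', or 'regular' for an account."""
    # Manual overrides and verification win over follower counts.
    if account.handle in MANUAL_CELEBRITIES or account.verified:
        return "celebrity"
    if account.followers > CELEBRITY_MIN:
        return "celebrity"
    if account.followers > INFLUENCER_MIN:
        return "influencer"
    return "regular"
```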
2.2 Celebrity Timeline Architecture
Celebrity Post Flow:
1. Celebrity posts content
2. Store in Celebrity Content Service
3. Mark as "celebrity post" in metadata
4. Skip fan-out process initially
5. Index for real-time retrieval
User Timeline Request (with celebrity content):
1. Fetch pre-computed timeline (regular follows)
2. Query Celebrity Content Service for followed celebrities
3. Merge celebrity content with regular timeline
4. Apply ranking algorithm
5. Return unified timeline
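The five request steps above might look like this in miniature. The stores (`PRECOMPUTED_TIMELINE`, `CELEBRITY_POSTS`, `FOLLOWED_CELEBRITIES`) are hypothetical dictionaries standing in for the timeline cache and the Celebrity Content Service, and the ranking step is simplified to newest-first ordering.

```python
import heapq

# Hypothetical stores; entries are (timestamp, post_id), newest first.
PRECOMPUTED_TIMELINE = {"dave": [(104, "b2"), (101, "a1")]}
CELEBRITY_POSTS = {"star": [(106, "s2"), (102, "s1")]}
FOLLOWED_CELEBRITIES = {"dave": ["star"]}


def hybrid_timeline(user_id, limit=20):
    """Merge the cached timeline with celebrity posts pulled at read time."""
    # Step 1: pre-computed timeline built by fan-out on write.
    base = PRECOMPUTED_TIMELINE.get(user_id, [])
    # Step 2: pull recent posts for each followed celebrity.
    pulled = [CELEBRITY_POSTS.get(c, []) for c in FOLLOWED_CELEBRITIES.get(user_id, [])]
    # Step 3: merge the sorted feeds.
    merged = heapq.merge(base, *pulled, key=lambda post: post[0], reverse=True)
    # Step 4 (ranking) is reduced to recency order in this sketch.
    # Step 5: return the unified timeline.
    return [post_id for _, post_id in merged][:limit]
```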
2.3 Celebrity Content Optimization
Multi-Tier Celebrity Handling:
├── Tier 1: Mega-celebrities (>10M followers)
│   ├── Never fan-out on write
│   ├── Always pull on read
│   ├── Dedicated celebrity content servers
│   └── Special caching strategies
├── Tier 2: Large influencers (1M-10M followers)
│   ├── Selective fan-out to active followers only
│   ├── Lazy loading for inactive followers
│   ├── Time-delayed fan-out processing
│   └── Batch processing optimizations
└── Tier 3: Medium influencers (100K-1M followers)
    ├── Standard fan-out with rate limiting
    ├── Async processing of fan-out
    ├── Priority queuing for processing
    └── Fallback to pull model during spikes
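One way to read the tiers above is as a rule for choosing which followers receive a pushed copy of a post. The function below is a simplified sketch: `fanout_targets` is a hypothetical helper, and rate limiting, delayed batches, and the spike fallback are not modeled.

```python
def fanout_targets(follower_count, followers, active):
    """Pick which followers receive a pushed copy of a new post.

    followers: list of follower ids; active: set of recently active ids.
    Tier boundaries follow the list above.
    """
    if follower_count > 10_000_000:
        return []  # Tier 1: never fan out, followers pull on read
    if follower_count > 1_000_000:
        # Tier 2: push only to active followers; the rest load lazily.
        return [f for f in followers if f in active]
    return list(followers)  # Tier 3 and below: standard fan-out
```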
3. Platform-Specific Timeline Designs
3.1 Twitter Timeline Architecture
Home Timeline Design
Twitter Hybrid Approach:
├── Regular Users (<50K followers)
│   ├── Fan-out on write to all followers
│   ├── Store in Redis timeline cache
│   ├── TTL-based expiration (7 days)
│   └── Async processing for non-active users
├── Popular Users (>50K followers)
│   ├── Skip fan-out completely
│   ├── Pull model for timeline generation
│   ├── Cache popular user content separately
│   └── Merge during timeline assembly
└── Mixed Timeline Assembly
    ├── Base timeline from cache (fan-out users)
    ├── Pull celebrity content on-demand
    ├── Merge and rank by recency/relevance
    └── Apply user-specific filters
Tweet Ranking Factors
Twitter Timeline Ranking:
├── Temporal Score (40%)
│   ├── Tweet timestamp (recent = higher score)
│   ├── Engagement velocity (trending boost)
│   └── Real-time event correlation
├── Social Score (30%)
│   ├── Retweets and quote tweets
│   ├── Likes and replies ratio
│   ├── Author credibility score
│   └── Network engagement (friends' interactions)
├── Relevance Score (20%)
│   ├── User interest alignment
│   ├── Topic/hashtag preferences
│   ├── Historical engagement patterns
│   └── Language and geographic relevance
└── Quality Score (10%)
    ├── Spam/bot detection
    ├── Content authenticity
    ├── Media quality indicators
    └── Advertiser-friendly content
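The four weighted factors above amount to a linear combination. A minimal sketch, assuming each factor has already been normalized to the range 0 to 1 (how those sub-scores are computed is not specified here, and `rank_score` and `rank` are hypothetical names):

```python
# Weights copied from the ranking breakdown above.
WEIGHTS = {"temporal": 0.4, "social": 0.3, "relevance": 0.2, "quality": 0.1}


def rank_score(signals):
    """Combine per-factor scores (each assumed to be in [0, 1]) into one score."""
    return sum(WEIGHTS[name] * signals.get(name, 0.0) for name in WEIGHTS)


def rank(posts):
    """Sort (post_id, signals) pairs from highest to lowest combined score."""
    ordered = sorted(posts, key=lambda item: rank_score(item[1]), reverse=True)
    return [post_id for post_id, _ in ordered]


posts = [
    ("fresh", {"temporal": 1.0, "social": 0.2, "relevance": 0.5, "quality": 0.9}),
    ("viral", {"temporal": 0.6, "social": 1.0, "relevance": 0.7, "quality": 0.8}),
    ("stale", {"temporal": 0.1, "social": 0.4, "relevance": 0.9, "quality": 0.9}),
]
```

With these sample signals, the highly engaged post outranks the newest one because the social weight outweighs its smaller temporal score.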
3.2 Facebook News Feed Design
EdgeRank Algorithm Evolution
Facebook Timeline Generation:
├── Relationship Scoring
│   ├── Interaction frequency with content creator
│   ├── Profile visits and photo tags
│   ├── Message exchange history
│   └── Mutual friend connections
├── Content Type Weighting
│   ├── Video content (highest priority)
│   ├── Photo posts (high priority)
│   ├── Link shares (medium priority)
│   ├── Text status (lower priority)
│   └── Live video (temporary boost)
├── Recency Decay Function
│   ├── Exponential decay over time
│   ├── Slower decay for high-engagement posts
│   ├── Boost for "evergreen" content
│   └── Time zone adjustment for users
└── Personalization Layer
    ├── Individual user behavior patterns
    ├── Content category preferences
    ├── Device and platform optimization
    └── A/B testing variations
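The recency decay described above (exponential decay, slower for high-engagement posts) can be illustrated with a half-life formula. The 6-hour half-life and the way engagement stretches it are assumptions chosen for the example, not documented values, and `decayed_score` is a hypothetical name.

```python
def decayed_score(base_score, age_hours, half_life_hours=6.0, engagement=0.0):
    """Exponentially decay a score with age.

    engagement in [0, 1] stretches the half-life, so popular posts fade more
    slowly. The default half-life and stretch factor are illustrative only.
    """
    effective_half_life = half_life_hours * (1.0 + engagement)
    # The score halves every `effective_half_life` hours.
    return base_score * 0.5 ** (age_hours / effective_half_life)
```

For example, a post with no engagement keeps half its score after 6 hours, while a post with `engagement=1.0` takes 12 hours to lose the same amount.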
Facebook Celebrity Handling
Facebook Page Posts (Celebrity/Brand):
├── Page Post Creation
│   ├── Store in Page Content Database
│   ├── Skip immediate fan-out to all followers
│   ├── Analyze content for viral potential
│   └── Queue for selective distribution
├── Smart Distribution Strategy
│   ├── Initial distribution to top 1% engaged followers
│   ├── Monitor engagement rate in first hour
│   ├── Expand distribution if high engagement
│   ├── Throttle distribution if low engagement
│   └── Paid promotion integration for reach
└── Timeline Integration
    ├── Pull page content during feed generation
    ├── Compete with organic content for feed slots
    ├── Apply page-specific ranking adjustments
    └── Balance organic vs promotional content
3.3 Instagram Timeline Design
Instagram Feed Architecture
Instagram Hybrid Timeline:
├── Following Feed (Chronological + Algorithmic)
│   ├── Recent posts from followed accounts
│   ├── Story highlights integration
│   ├── Suggested posts insertion
│   └── Ad placement optimization
├── Explore Feed (Algorithm-driven)
│   ├── Content from non-followed accounts
│   ├── Hashtag and location-based discovery
│   ├── Influencer content promotion
│   └── Shopping integration
└── Stories Timeline (Ephemeral)
    ├── 24-hour TTL content
    ├── Chronological ordering by posting time
    ├── Close friends priority
    └── Interactive elements (polls, questions)
Instagram Celebrity Strategy
Instagram Influencer Architecture:
├── Creator Account Classification
│   ├── Regular users: Standard fan-out
│   ├── Creators (>10K followers): Selective fan-out
│   ├── Verified accounts: Pull model
│   └── Business accounts: Paid reach model
├── Content Distribution Tiers
│   ├── Tier 1: Immediate delivery to close connections
│   ├── Tier 2: Gradual rollout to engaged followers
│   ├── Tier 3: Algorithmic distribution to broader audience
│   └── Tier 4: Explore page featuring for viral content
└── Engagement-based Amplification
    ├── Monitor early engagement signals
    ├── Boost high-performing content
    ├── Reduce reach for low-engagement posts
    └── Creator bonus programs for viral content
3.4 LinkedIn Feed Design
Professional Content Timeline
LinkedIn Feed Architecture:
├── Professional Relevance Scoring
│   ├── Industry and job function alignment
│   ├── Skill overlap with user profile
│   ├── Company and educational connections
│   └── Professional level matching
├── Content Type Prioritization
│   ├── Original long-form articles (highest)
│   ├── Industry insights and analysis
│   ├── Career updates and achievements
│   ├── Professional networking posts
│   └── Job postings and opportunities
├── Network Amplification
│   ├── 1st degree connections (highest visibility)
│   ├── 2nd degree mutual connections
│   ├── Industry leader content
│   ├── Company page updates
│   └── LinkedIn Learning integration
└── Time-sensitive Professional Content
    ├── Breaking industry news
    ├── Job application deadlines
    ├── Networking event announcements
    └── Professional milestone celebrations
4. Timeline Storage Architecture
4.1 Storage Patterns
Push Model Storage
Timeline Cache Structure:
User Timeline Table:
├── user_id (partition key)
├── post_id (sort key)
├── timestamp
├── post_content_summary
├── author_info
├── ranking_score
└── ttl (expiration time)
Characteristics:
├── High storage cost (duplicated data)
├── Fast read performance
├── Complex write operations
└── Storage grows with user connections
Pull Model Storage
Post Storage:
User Posts Table:
├── author_id (partition key)
├── post_id (sort key)
├── timestamp
├── content
├── metadata
└── engagement_metrics
User Connections Table:
├── user_id (partition key)
├── following_user_id (sort key)
├── connection_type
├── connection_timestamp
└── relationship_strength
Characteristics:
├── Low storage overhead
├── Single source of truth
├── Complex read queries
└── Real-time data consistency
4.2 Hybrid Storage Strategy
Mixed Storage Approach:
├── Hot Timeline Cache (Redis)
│   ├── Last 100 posts per user
│   ├── TTL: 24-48 hours
│   ├── Fast read access
│   └── Memory-optimized
├── Warm Timeline Storage (Cassandra)
│   ├── Last 1000 posts per user
│   ├── TTL: 30 days
│   ├── SSD-based storage
│   └── Moderate read performance
├── Cold Post Archive (S3/HBase)
│   ├── All historical posts
│   ├── Permanent storage
│   ├── Slow access (batch queries)
│   └── Cost-optimized storage
└── Celebrity Content Cache
    ├── Dedicated celebrity post storage
    ├── Global replication
    ├── High availability
    └── Specialized indexing
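A read path over the hot and warm tiers above could follow a read-through pattern: try the hot cache, fall back to the warm store, and rebuild from the archive only as a last resort. In this sketch the tiers are plain dictionaries, `rebuild` is a hypothetical callback standing in for recomputation from the cold archive, and TTLs are not modeled.

```python
class TieredTimelineStore:
    """Read-through lookup across a hot cache and a warm store (plain dicts here)."""

    HOT_LIMIT = 100  # "last 100 posts per user" from the hot tier above

    def __init__(self, warm, rebuild):
        self.hot = {}
        self.warm = warm        # stand-in for the warm timeline store
        self.rebuild = rebuild  # assumed callable: user_id -> list of post ids

    def get_timeline(self, user_id):
        """Return (posts, source) where source names the tier that answered."""
        if user_id in self.hot:
            return self.hot[user_id], "hot"
        if user_id in self.warm:
            posts, source = self.warm[user_id], "warm"
        else:
            posts, source = self.rebuild(user_id), "rebuilt"
        # Promote the newest entries so the next read is a hot hit.
        self.hot[user_id] = posts[: self.HOT_LIMIT]
        return self.hot[user_id], source
```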
5. Real-Time Timeline Updates
5.1 Live Update Architecture
Real-Time Update Flow:
New Post/Interaction → Message Queue → Timeline Update Service → Push to Active Users
Components:
├── WebSocket Connections
│   ├── Persistent connections for active users
│   ├── Real-time post delivery
│   ├── Typing indicators and live reactions
│   └── Connection management and scaling
├── Server-Sent Events (SSE)
│   ├── One-way real-time updates
│   ├── Timeline refresh notifications
│   ├── New post availability alerts
│   └── Trending content notifications
├── Push Notifications
│   ├── Mobile app notifications
│   ├── Personalized content alerts
│   ├── Social interaction notifications
│   └── Breaking news and viral content
└── Polling Fallback
    ├── Legacy client support
    ├── Network connectivity issues
    ├── Battery optimization
    └── Graceful degradation
5.2 Event-Driven Updates
Timeline Update Events:
├── Content Events
│   ├── New post created
│   ├── Post edited/updated
│   ├── Post deleted/hidden
│   └── Content moderation actions
├── Social Events
│   ├── New follower/connection
│   ├── User mention in post
│   ├── Post likes/reactions
│   ├── Comments and replies
│   └── Shares and reposts
├── System Events
│   ├── Algorithm updates
│   ├── Trending content identification
│   ├── Spam/abuse detection
│   └── Performance optimization triggers
└── External Events
    ├── Breaking news integration
    ├── Sports scores and results
    ├── Stock market updates
    └── Weather and emergency alerts
6. Timeline Ranking & Personalization
6.1 Machine Learning Pipeline
ML-Driven Timeline Ranking:
├── Feature Engineering
│   ├── User behavior features (clicks, time spent, shares)
│   ├── Content features (type, length, media quality)
│   ├── Social features (author credibility, network engagement)
│   ├── Contextual features (time, location, device)
│   └── Historical features (past interactions, preferences)
├── Model Training
│   ├── Training data from user interactions
│   ├── Multiple objective optimization (CTR, engagement, time spent)
│   ├── A/B testing framework integration
│   ├── Real-time model updates
│   └── Bias detection and mitigation
├── Inference Pipeline
│   ├── Real-time scoring during timeline generation
│   ├── Batch processing for pre-computation
│   ├── Model serving infrastructure
│   ├── Feature store integration
│   └── Performance monitoring
└── Feedback Loop
    ├── User interaction tracking
    ├── Model performance analysis
    ├── Automated model retraining
    └── Human feedback integration
6.2 Personalization Strategies
Individual Timeline Customization:
├── Interest Profiling
│   ├── Topic modeling from user interactions
│   ├── Hashtag and keyword preferences
│   ├── Content category weighting
│   └── Temporal interest evolution
├── Social Graph Analysis
│   ├── Close friend identification
│   ├── Interest community detection
│   ├── Influence network mapping
│   └── Echo chamber prevention
├── Behavioral Adaptation
│   ├── Optimal posting time detection
│   ├── Content format preferences
│   ├── Engagement pattern analysis
│   └── Attention span optimization
└── Context Awareness
    ├── Device-specific optimization
    ├── Location-based content
    ├── Time-sensitive personalization
    └── Mood and sentiment adaptation
7. Performance Optimization
7.1 Caching Strategy
Multi-Layer Caching:
├── Browser Cache
│   ├── Static assets (images, CSS, JS)
│   ├── Recently viewed content
│   ├── User session data
│   └── Offline content access
├── CDN Cache
│   ├── Media files (photos, videos)
│   ├── Popular content
│   ├── Geographic distribution
│   └── Edge server optimization
├── Application Cache
│   ├── User timeline cache (Redis)
│   ├── Celebrity content cache
│   ├── Trending topics cache
│   └── Search results cache
├── Database Cache
│   ├── Query result caching
│   ├── Connection pooling
│   ├── Read replica caching
│   └── Index optimization
└── Smart Cache Invalidation
    ├── Event-driven cache updates
    ├── TTL-based expiration
    ├── User activity-based invalidation
    └── A/B testing cache isolation
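Two of the invalidation strategies above, TTL-based expiration and event-driven updates, can be combined in a small cache wrapper. This is an illustrative sketch only: `TTLCache` is a hypothetical class, and the injectable `clock` parameter exists so the behavior can be exercised without waiting for real time to pass.

```python
import time


class TTLCache:
    """Tiny TTL cache with explicit invalidation; the clock is injectable for tests."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def set(self, key, value):
        self._entries[key] = (self.clock() + self.ttl, value)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._entries[key]  # TTL-based expiration
            return None
        return value

    def invalidate(self, key):
        """Event-driven invalidation, e.g. called when a post is deleted."""
        self._entries.pop(key, None)
```

A timeline update service could call `invalidate` when it receives a "post deleted" event, while the TTL acts as a safety net for events that are missed.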
7.2 Database Optimization
Timeline Database Design:
├── Partitioning Strategy
│   ├── User-based partitioning for timelines
│   ├── Time-based partitioning for posts
│   ├── Geographic partitioning for global scale
│   └── Celebrity-specific partitioning
├── Indexing Strategy
│   ├── Composite indexes for timeline queries
│   ├── Sparse indexes for inactive users
│   ├── Full-text search indexes
│   └── Geospatial indexes for location
├── Replication Strategy
│   ├── Master-slave for read scaling
│   ├── Multi-master for global writes
│   ├── Cross-region replication
│   └── Consistency level tuning
└── Query Optimization
    ├── Query plan analysis
    ├── Batch query processing
    ├── Async query execution
    └── Connection pooling
8. Scalability Patterns
8.1 Horizontal Scaling
Timeline Service Scaling:
├── Service Decomposition
│   ├── Timeline generation service
│   ├── Content ranking service
│   ├── User graph service
│   ├── Celebrity content service
│   └── Real-time update service
├── Load Distribution
│   ├── User-based sharding
│   ├── Geographic load balancing
│   ├── Service mesh architecture
│   └── Auto-scaling policies
├── Data Partitioning
│   ├── Consistent hashing for timelines
│   ├── Range partitioning for posts
│   ├── Celebrity data isolation
│   └── Cross-partition query optimization
└── Failure Handling
    ├── Circuit breaker patterns
    ├── Graceful service degradation
    ├── Fallback timeline strategies
    └── Data consistency recovery
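"Consistent hashing for timelines" in the list above can be sketched with a hash ring and virtual nodes. `HashRing`, the shard names, and the choice of 100 virtual nodes per shard are assumptions for illustration; SHA-256 is used only as a convenient stable hash.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring with virtual nodes for spreading timelines over shards."""

    def __init__(self, nodes, vnodes=100):
        # Each physical node appears `vnodes` times on the ring to even out load.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(value):
        digest = hashlib.sha256(value.encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, user_id):
        # First virtual node clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._keys, self._hash(str(user_id))) % len(self._keys)
        return self._ring[idx][1]
```

The useful property is that adding a shard moves only the keys that land on the new shard; every other user keeps their existing placement.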
8.2 Global Distribution
Multi-Region Timeline Architecture:
├── Regional Data Centers
│   ├── User data locality
│   ├── Content delivery optimization
│   ├── Regulatory compliance
│   └── Disaster recovery
├── Content Replication
│   ├── Celebrity content global replication
│   ├── Viral content rapid distribution
│   ├── Regional content preferences
│   └── Language-specific content
├── Cross-Region Consistency
│   ├── Eventually consistent timelines
│   ├── Strong consistency for critical operations
│   ├── Conflict resolution strategies
│   └── Data synchronization optimization
└── Network Optimization
    ├── Edge server placement
    ├── Content delivery networks
    ├── Protocol optimization (HTTP/2, QUIC)
    └── Mobile network adaptation
9. Analytics & Monitoring
9.1 Timeline Performance Metrics
Key Performance Indicators:
├── User Experience Metrics
│   ├── Timeline load time (<200ms target)
│   ├── Content freshness (time to see new posts)
│   ├── Scroll performance (60fps target)
│   └── Engagement rates (clicks, shares, time spent)
├── System Performance Metrics
│   ├── Timeline generation throughput
│   ├── Database query performance
│   ├── Cache hit/miss ratios
│   ├── API response times
│   └── Error rates and availability
├── Business Metrics
│   ├── Daily active users
│   ├── Session duration
│   ├── Content consumption rates
│   ├── Ad placement effectiveness
│   └── Revenue per user
└── Infrastructure Metrics
    ├── Server resource utilization
    ├── Database connection pools
    ├── Network bandwidth usage
    ├── Storage costs and efficiency
    └── Auto-scaling trigger events
9.2 A/B Testing Framework
Timeline Algorithm Testing:
├── Experiment Design
│   ├── Control vs treatment groups
│   ├── Statistical significance requirements
│   ├── Business metric targets
│   └── Risk mitigation strategies
├── Implementation
│   ├── User bucketing strategies
│   ├── Feature flag systems
│   ├── Gradual rollout mechanisms
│   └── Real-time monitoring
├── Analysis
│   ├── Statistical significance testing
│   ├── Multi-variate analysis
│   ├── Cohort behavior analysis
│   └── Long-term impact assessment
└── Decision Making
    ├── Business impact evaluation
    ├── Technical debt assessment
    ├── User experience impact
    └── Rollback procedures
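One common user bucketing strategy is to hash the user id together with the experiment name, so that assignment is stable across requests and independent between experiments. The function below is a sketch of that idea; `assign_bucket` and the experiment name used in the example are hypothetical.

```python
import hashlib


def assign_bucket(user_id, experiment, treatment_percent=10):
    """Deterministically map a user to 'treatment' or 'control' for one experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).digest()
    slot = int.from_bytes(digest[:8], "big") % 100  # stable slot in [0, 100)
    return "treatment" if slot < treatment_percent else "control"


group = assign_bucket("user-42", "ranking-v2", treatment_percent=10)
```

Raising `treatment_percent` over time gives a gradual rollout: users already in the treatment group stay there, and new users are added as the threshold grows.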
Key Timeline Design Principles
- Hybrid Approach: Combine push/pull models based on user characteristics
- Celebrity Problem Solution: Separate handling for high-follower accounts
- Real-Time Updates: Event-driven architecture for live content
- ML-Driven Personalization: Algorithm-based content ranking
- Multi-Layer Caching: Optimize for read-heavy workloads
- Horizontal Scalability: Handle billions of users and posts
- Global Distribution: Low-latency access worldwide
- Performance Monitoring: Data-driven optimization and A/B testing
Bottom Line: Timeline design is the heart of social media platforms, requiring sophisticated algorithms to balance real-time updates, personalization, scalability, and user engagement while solving the unique challenges posed by celebrity accounts and viral content distribution.